Description of the document

This is the Opportunity Mapping 2.0 Technical Document produced by Phuong Tseng. The intention is to capture changes and developments in the 2019 version.

The Methodology Document and Spreadsheet

  1. 2019 Opportunity Mapping Indicators and Measures
  2. [OM_methodology_v4_Nov30.pdf] was updated on November 30, 2018.
  3. 2015 - 2019 Opportunity Mapping 2.0 Document
  4. 2014 - 2016 Meeting Notes

Set-up

A. The Domains

The 2019 version has five domains: education, economic & mobility, housing and neighborhood, conduit, and social capital. Social capital is new in 2019.

1. Education Opportunity Indicators

This year, the education domain added a new indicator, Early Childhood Participation Rate (Pre-K). Another indicator, adults with a bachelor’s degree, was moved from the education domain to the economic & mobility domain in 2019.

common_fields <- c("fips",
                   "CountyID.x",
                   "TOTPOP.x", "county_name.x")
edu_list <-
  c(
  "math_prof",
  "read_prof",
  "grad_rate",
  "pct_not_frpm",
  "z_math_prof",
  "z_read_prof",
  "z_grad_rate",
  "az_pct_not_frpm",
  "HD01_VD04",
  "HD01_VD03",
  "ratio",
  "ratio2",
  "z_preK"
  )

2. Economic & Mobility Opportunity Indicators

This domain changed in several ways in 2019. Three indicators were added: adults with a bachelor’s degree, median household income, and median home value. The measures for other indicators, such as commuting time and entry-level jobs, were changed to TCAC’s measures. A new indicator, school district revenue per capita, was added to capture the extent of municipal hoarding. Because municipal data had reliability issues, school district boundaries were used as a proxy instead.

econ_list <- c(
  "total_pop_2017",
  "below_200_pov_2017.x",
  "moe_below_200_pov_2017.x",
  "pct_below_pov_2017",
  "moe_pct_below_pov_2017",
  "pct_below_200_pov_2017.x",
  "pct_assist_2017",
  "med_hhincome_2017" ,
  "moe_med_hhincome_2017" ,
  "employed_pop_20to60_2017",
  "pct_employed_20to60_2017",
  "home_value_2017" ,
  "moe_home_value_2017",
  "pct_bachelors_plus_2017",
  "above_200_pov_2017",
  "pct_above_200_pov_2017",
  "tot_hh_2017",
  "moe_tot_hh_2017",
  "moe_pct_long_commute_2017",
  "moe_assist_2017",
  "moe_long_commute_pct",
  "long_commute_pct",
  "low_wage_med_distance" ,
  "jobs_lowed" ,
  "rural_flag",
  "az_pct_assist_2017" ,
  "az_pct_employed_20to60_2017",
  "z_home_value_2017" ,
  "z_pct_bachelors_plus_2017" ,
  "az_pct_long_commute_2017",
  "z_jobs_lowed" ,
  "Econ_Domain",
  "z_sdrevpcap",
  "sdrev",
  "sdrevpcap",
  "sd_totpop"
  )
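
The field names sdrev, sd_totpop, sdrevpcap, and z_sdrevpcap suggest that school district revenue per capita is total revenue divided by district population, then standardized. The following is a minimal sketch under that assumption; the formula and the toy values are illustrative and are not taken from the original analysis.

```r
# Toy school-district data; all values are illustrative only.
districts <- data.frame(
  sdrev     = c(5e6, 12e6, 8e6),       # total district revenue (assumed to be dollars)
  sd_totpop = c(10000, 20000, 40000)   # district population
)

# Assumed definition: revenue per capita = revenue / district population.
districts$sdrevpcap <- districts$sdrev / districts$sd_totpop

# Standardize to a z-score (mean 0, standard deviation 1) across districts.
districts$z_sdrevpcap <- as.numeric(scale(districts$sdrevpcap))

print(districts)
```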

3. Housing & Neighborhood Opportunity Indicators

The housing and neighborhood opportunity domain has two new environmental indicators pulled from CalEnviroScreen (i.e. pm25, lead).

housing_list <-
  c("below_200_pov_2017.y",
  "moe_below_200_pov_2017.y",
  "pct_below_200_pov_2017.y",
  "pm25",
  "pct_pm25",
  "toxRelease",
  "pct_toxRelease",
  "lead_pctl",
  "pct_lead_pctl" ,
  "Grocery",
  "z_Grocery" ,
  "az_Grocery",
  "P_INSURED" ,
  "az_insurance" ,
  "H_Crime",
  "pct_parks",
  "az_pct_below_200_pov_2017",
  "az_pct_below_200_pov_20172",
  "az_pct_pm25",
  "az_pct_toxRelease",
  "az_pct_lead_pctl" ,
  "Housing_Env_Domain",
  "test_azcrime" ,
  "azhealthcare" ,
  "zparks"
  )

4. Conduit

The Conduit domain has two indicators: median broadband download speed and percentage of single-parent households.

conduit_list <-
  c(
  "pct_singleparent_hh_2017.y",
  "moe_pct_singleparent_hh_2017.y",
  "az_pct_singleparent_hh_2017",
  "TOTPOP.y",
  "Median_bb",
  "z_broadband",
  "z_broadband2",
  "Conduit"
  )

5. Social Capital

This is our newest domain. Its indicators include the average distance to a religious institution, the voting rate among registered voters, and the average distance to club membership, among others.

socap_list <-
  c(
  "pct_singleparent_hh_2017.y",
  "moe_pct_singleparent_hh_2017.y",
  "az_pct_singleparent_hh_2017",
  "Clubs",
  "AVGDIS_REL",
  "reg_vote",
  "SOCIAL_CAP",
  "z_regvoter",
  "zreligious",
  "zclubs"
  )

6. Compile All Indicators Function

7. Calculate Domains

B. Index Calculation

The analyst decides whether it makes sense to calculate the index before or after the filtering process. In this case, I calculated the regional index for all tracts first and filtered the tracts in later steps, because it is important to display each tract’s score next to its opportunity category for comparison. Our previous analyses show that some tracts may have high index values alongside a high percentage of single-parent households and concentrated poverty.

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -0.757397 -0.130434 -0.006168  0.004954  0.119239  1.180872

C. Filters

Our filtering process consists of two conditions: 1) Poverty (below 200 FPL) >= 30% and Single-parent family >= 30%, OR 2) High Divergence with population of Black and Latinx > 50% and poverty (below 200 FPL) >= 30%. Steps 1 - 3 deal with the first condition, while steps 4 - 6 handle the second.
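
The two conditions can be sketched as logical tests coded as -1 (flagged) or 0 (not flagged), which matches the -1/0 range of the summaries reported in the steps below. This is a sketch, not the original filter function: divergence_high and pct_black_latinx are hypothetical column names, the rates are assumed to be stored as proportions, and the data are made up.

```r
# Toy tract table. divergence_high and pct_black_latinx are hypothetical names;
# rates are assumed to be proportions (0.30 = 30%).
tracts <- data.frame(
  fips                     = c("A", "B", "C", "D", "E"),
  pct_below_200_pov_2017   = c(0.45, 0.35, 0.10, 0.32, NA),
  pct_singleparent_hh_2017 = c(0.40, 0.10, 0.50, 0.20, 0.35),
  divergence_high          = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  pct_black_latinx         = c(0.20, 0.60, 0.70, 0.55, 0.80)
)

# Code a logical condition as -1 (flagged) or 0 (not flagged); NA stays NA.
flag <- function(cond) ifelse(cond, -1, 0)

pov   <- tracts$pct_below_200_pov_2017 >= 0.30
sp    <- tracts$pct_singleparent_hh_2017 >= 0.30
divbl <- tracts$divergence_high & tracts$pct_black_latinx > 0.50

tracts$f_pov_sp  <- flag(pov & sp)                    # condition 1
tracts$f_div_pov <- flag(divbl & pov)                 # condition 2
tracts$f_final   <- flag((pov & sp) | (divbl & pov))  # condition 1 OR condition 2

summary(tracts$f_final)
sum(tracts$f_final, na.rm = TRUE)  # negative count of flagged tracts
```

Because flagged tracts are coded as -1, the sum of a flag column is the negative of the number of flagged tracts.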

1. Filtering Single parent families >= 30%

returns 471 records with 8 NAs

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.0000 -1.0000  0.0000 -0.2983  0.0000  0.0000

2. Filtering Poverty (below 200 FPL) >= 30%

returns 418 records with 3 NAs

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.0000 -1.0000  0.0000 -0.2647  0.0000  0.0000

3. Filtering Single parent >= 30% AND Poverty (below 200 FPL) >= 30%

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.0000  0.0000  0.0000 -0.1849  0.0000  0.0000
## [1] -292

4. High Divergence and population of Black and Latinx > 50%

## [1] -201

5. High Divergence with population of Black and Latinx > 50% and poverty (below 200 FPL) >= 30%

## [1] -201
## [1] -418
## [1] -171
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.0000 -1.0000  0.0000 -0.2647  0.0000  0.0000

6. Final Filter

High Divergence with population of Black and Latinx > 50% and poverty (below 200 FPL) >= 30% OR Poverty (below 200 FPL) >= 30% and Single-parent family >= 30%

## [1] -320

Check filters

Filter Function

D. Categorization with Filters

1. Categorization with Filters

E. Categorization without Filters

Here, I take a slightly different approach to categorization. Instead of assigning 25% of the records to each category, I assign 20% to each, so every category has the same number of records. This works because these records are categorized only by their index values rather than by filters.
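
One way to sketch the 20% split is with quantile break points and cut(). The simulated index and the category labels are assumptions for illustration; the original code may differ.

```r
set.seed(1)
index <- rnorm(100)  # stand-in for the opportunity index

# Quintile break points: 0%, 20%, ..., 100% of the index distribution.
breaks <- quantile(index, probs = seq(0, 1, by = 0.2), na.rm = TRUE)

# Assign each record to a quintile; the labels are assumed names.
category <- cut(
  index,
  breaks = breaks,
  labels = c("Lowest", "Low", "Moderate", "High", "Highest"),
  include.lowest = TRUE  # keep the minimum value in the first bin
)

table(category)
```

With tied index values the break points can coincide, in which case cut() would need unique breaks or a rank-based split instead.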

1. Graphs or Charts

Opportunity Index Scores by CBSA

Opportunity Categories by County

Number of Tracts by Opportunity Category and Index Scores

F. Missing Values

These are records with NAs or missing values
1. fips 06081984300 has NaN in pct_pov_below_200 and pct_singleparent_hh
2. fips 06081984300 (Mod) changed to NAs
3. fips 06095253000 has NaN in pct_pov_below_200 and pct_singleparent_hh
4. fips 06095253000 (Highest) changed to NAs
5. fips 06095980000 has NaN in pct_pov_below_200 and pct_singleparent_hh
6. fips 06095980000 (High) changed to NAs

The records below have poverty rates, so they were not changed to NAs; this keeps them in the count even though they lack pct_singleparent_hh. These records were categorized based on their index values.
  1. fips 06001981900 has NA in pct_singleparent_hh
  2. fips 06001981900 (High)
  3. fips 06013351101 has NA in pct_singleparent_hh
  4. fips 06013351101 (Mod)
  5. fips 06013351102 has NA in pct_singleparent_hh
  6. fips 06013351102 (Mod)
  7. fips 06013351103 has NA in pct_singleparent_hh
  8. fips 06013351103 (High)
  9. fips 06075980300 has NA in pct_singleparent_hh
  10. fips 06075980300 (High)
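
The rule described in this section can be sketched as follows: a tract with no poverty rate loses its category, while a tract missing only the single-parent rate keeps its index-based category. The tract IDs, values, and the category column name are made up for illustration.

```r
# Toy records; IDs, values, and the 'category' column name are illustrative.
tracts <- data.frame(
  fips                = c("T1", "T2", "T3"),
  pct_pov_below_200   = c(NaN, 0.25, 0.40),
  pct_singleparent_hh = c(NaN, NA, 0.35),
  category            = c("Mod", "High", "Low"),
  stringsAsFactors    = FALSE
)

# is.na() is TRUE for both NA and NaN, so it catches either kind of gap.
no_poverty <- is.na(tracts$pct_pov_below_200)

# Without a poverty rate the tract cannot be evaluated: set its category to NA.
tracts$category[no_poverty] <- NA

# T2 lacks only the single-parent rate, so it keeps its index-based category.
tracts
```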

Output: Index with Filters

Correlations Exploration


G. Overlays

Racial and Ethnic Composition Overlay

Data Source: ACS Census data 2010-2014
Description: To analyze the distribution of racial and ethnic composition by opportunity category, first join the two datasets, then aggregate the population of each racial group within each opportunity category.
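
The join-then-aggregate step might look like the sketch below in base R. The column names other than fips, and all values, are hypothetical.

```r
# Toy inputs; column names other than fips are hypothetical.
categories <- data.frame(
  fips     = c("001", "002", "003", "004"),
  category = c("High", "Low", "High", "Low")
)
population <- data.frame(
  fips   = c("001", "002", "003", "004"),
  white  = c(100, 50, 200, 30),
  black  = c(20, 80, 10, 90),
  latinx = c(30, 70, 40, 60)
)

# Step 1: join the two datasets on the tract identifier.
joined <- merge(categories, population, by = "fips")

# Step 2: total population of each group within each opportunity category.
totals <- aggregate(cbind(white, black, latinx) ~ category,
                    data = joined, FUN = sum)

# Optional: share of each group's population that lives in each category.
shares <- totals
shares[-1] <- lapply(totals[-1], function(x) x / sum(x))

totals
shares
```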

Median Household Income

Data Source: American Community Survey (5-year-estimates)
Table: B19013_001 – MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)

Payday Lending Overlay

Data Source: ESRI Business Analyst
Spreadsheet: OV_YEAR_Payday
Description: 2017 Measure – Spatially join the Bay Area payday lending shapefile to the 2014 census tract shapefile with the opportunity categories to obtain the number of businesses per census tract. Then divide the number of businesses in each tract by the total number of payday lending and credit businesses in the Bay Area to obtain the percentage.
2018 Measure – Check whether the salevolume column in the dataset holds the volume of payday loan sales. If it does, aggregate those sales by tract to identify the amount of sales in each neighborhood, OR (if possible) identify where the payday lenders charging the highest interest rates (200-400%) are located and how many of them are in each census tract.
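
The percentage in the 2017 measure can be sketched as below, assuming the spatial join has already attached a tract ID to each business. The lenders table and its values are made up, and n_payday and pct_payday are hypothetical field names.

```r
# Toy result of the spatial join: one row per payday lender with its tract ID.
lenders <- data.frame(
  business_id = 1:6,
  fips        = c("001", "001", "002", "003", "003", "003")
)

# Number of businesses per tract.
counts <- as.data.frame(table(fips = lenders$fips),
                        responseName = "n_payday",
                        stringsAsFactors = FALSE)

# Share of all payday lenders in the region located in each tract, in percent.
counts$pct_payday <- 100 * counts$n_payday / sum(counts$n_payday)

counts
```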

#load(file="BA_payday_2018.RData")
#proj4string(BA_payday_2018)

Subsidized Housing Overlay

Data Source: HUD subsidized housing projects
Spreadsheet: OV_Year_SubHous
Description:

• Data should be gathered from HUD instead of TCAC. Use the file obtained from HUD to create a point shapefile based on the latitude and longitude of each project (both are in the table).
• This table has all subsidized housing projects in California; use geoprocessing to clip the subsidized housing shapefile to the Bay Area.
• The map should include an analysis of projects and units, based on the subsidized units available and the number of subsidized programs in the region.

Low population density Overlay

Data Source: Census Data
Spreadsheet: OV_Year_LowDen
Description: Analyze the density of each census tract and identify low-density areas, defined as 40 or more acres per person.
• Calculate the area of each tract in acres, then divide it by the number of people; the result is stored in the POP_DEN field. All tracts with a value of 40 or above are highlighted on the map with a specific symbology.
Example:
Step 1: Create a new field, “Acres_per”, for each tract > Calculate Geometry > select Area > Coordinate System: Use Coordinate System of the data frame: PCS: NAD 1983 StatePlane California III FIPS 0403 > Units: Acres [US] (ac) > OK
Step 2: Create a new field, “POP_DEN”, whose value is “Acres_per” for each tract divided by the number of people in the tract > select the tracts that have a value of 40 or above
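
The same calculation can be sketched in R once the tract areas have been exported. The values are made up, TOTPOP is assumed to hold the tract population, and returning NA for tracts with zero population is a choice made for this sketch rather than something stated in the original workflow.

```r
# Toy tract table; Acres_per holds tract area in acres, as in Step 1.
tracts <- data.frame(
  fips      = c("001", "002", "003"),
  Acres_per = c(500000, 1200, 90000),
  TOTPOP    = c(10000, 4000, 0)
)

# Acres per person; tracts with zero population get NA instead of Inf (assumption).
tracts$POP_DEN <- ifelse(tracts$TOTPOP > 0,
                         tracts$Acres_per / tracts$TOTPOP,
                         NA)

# Low-density tracts: 40 or more acres per person.
low_density <- tracts[!is.na(tracts$POP_DEN) & tracts$POP_DEN >= 40, ]

low_density
```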